Overexposure and underexposure of items in the bank are serious problems
in operational computerized adaptive testing (CAT) systems. These
exposure problems might result in item compromise, or point at a waste of
investments. The exposure control problem can be viewed as a test assembly
problem with multiple objectives. Information in the test has to be
maximized, item compromise has to be minimized, and pool usage has to be
optimized. In this paper, a multiple objectives method is developed to deal
with both types of exposure problems. In this method, exposure control
parameters based on observed exposure rates are implemented as weights
for the information in the item selection procedure. The method does not
need time-consuming simulation studies, and it can be implemented
conditional on ability level. The method is compared with the
Sympson-Hetter method for exposure control, with the Progressive method,
and with alpha-stratified testing. The results show that the method is
successful in dealing with both kinds of exposure problems.


In computerized adaptive testing (CAT), items are selected on-the-fly.
Adaptive procedures are used to select items with optimal measurement
characteristics at the estimated ability level of examinees. CAT possesses
the same advantages as other computer-based testing procedures, like
increased flexibility and connection of administrative systems. Besides, for
a CAT it also holds that test length can be decreased by almost 40 percent
without decrease of measurement precision, and examinees are no longer
frustrated by items that are either too difficult or too easy (see e.g. van
der Linden & Glas, 2000; Wainer, Dorans, Flaugher, Green, Mislevy,
Steinberg, & Thissen, 1990).

CAT systems are theoretically based on the properties of item
response theory (IRT). In IRT, person parameters and item parameters are
separated. The item parameters are supposed to be invariant for different
values of the person parameters. Therefore, items can be calibrated and the
item parameters can be stored in item banks. From these item banks, items
that provide most information at the estimated person parameter are
selected. In many large scale testing programs, paper-and-pencil tests have
been replaced by CATs. For example, CAT versions are now available for the
Graduate Record Examination (GRE) and the Armed Services Vocational
Aptitude Battery (ASVAB).
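
To make the selection step concrete, the following is a minimal sketch of maximum-information item selection. It assumes a two-parameter logistic model with discrimination a and difficulty b; the function names and the toy item bank are illustrative and do not come from any operational CAT.

```python
import math

def prob_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_max_information(theta_hat, bank, administered):
    """Index of the not-yet-administered item with maximum information."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: fisher_information(theta_hat, *bank[i]))

# Hypothetical toy bank of (a, b) pairs; the values are illustrative only.
bank = [(1.0, -1.0), (2.0, 0.0), (3.0, 0.1), (1.0, 0.0)]
first = select_max_information(0.0, bank, administered=set())
```

In this toy bank the highly discriminating third item is selected first, which is the mechanism that makes some items far more popular than others.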

CITO (National Institute of Educational Measurement) in the
Netherlands administers several CATs, like MATHCAT (CITO, 1999),
TURCAT (CITO, in press), DSEcat (CITO, 2002) and KindergartenCAT.
MATHCAT is developed for diagnosing Mathematics deficiencies for
college students (Verschoor & Straetmans, 2000), TURCAT tests
proficiency of Turkish as a second language, DSEcat tests Dutch as a
Second Language, and KindergartenCAT contains tests for measuring
ordering, language, and orientation in time and space abilities of young
children (Eggen, 2004). These CATs, like almost all operational CAT
systems, encounter an unevenly distributed use of items in the bank.

In general, most item selection procedures favor some items above 
others, due to superior measurement properties or favorable item 
characteristics. As a result, some items are overexposed. This might result 
in item compromise, which undermines the validity of score-based 
inferences (Wise & Kingsbury, 2000). On the other hand, some items might 
be underexposed, which is a waste of investments. Therefore, choosing a 
strategy for controlling the exposure of items to examinees has become an 
integral part of test development (Davis & Dodd, 2003). 

In this paper, a multiple objectives exposure control method is 
proposed for dealing with problems of both overexposure and 
underexposure of the items. First, a theoretical background is given. Then, 
the new method is introduced. The performance of the method is evaluated 
in two studies. Finally, recommendations about the use of the new method 
are given. 


THEORETICAL BACKGROUND 

One of the first methods developed to deal with exposure control
problems is the 5-4-3-2-1 technique (Hetter & Sympson, 1997; McBride
& Martin, 1983) applied in the CAT-ASVAB. This randomized procedure
was developed to reduce the probability of identical item sequences in the
first five iterations of CAT. Kingsbury and Zara (1989) and Thomasson
(1998) developed different randomization methods aimed at reducing overall
item exposure. Rotating item pool methods (Ariel, Veldkamp, & van der
Linden, 2004; Way, 1998; Way, Steffen, & Anderson, 1998) and CAST
(Luecht & Nungester, 1998) were developed to spread the items over
different tests by a priori reducing the availability of items for selection.
However, in the CAT industry, item-exposure control methods based on the
Sympson and Hetter method (1985) are most commonly applied.

Sympson-Hetter methods 

Although some variations exist, the general idea underlying these
methods can be described as follows. To define these methods, two events
have to be distinguished: the event that item i is selected by the CAT
algorithm ($S_i$) and the event that item i is administered ($A_i$). The
probability that event $A_i$ occurs is the probability that $A_i$ occurs
given that $S_i$ has occurred times the probability that $S_i$ occurs:


$$P(A_i) = P(A_i \mid S_i)\, P(S_i). \tag{1}$$

To control the item exposure, one could focus on either of these
probabilities. In the Sympson-Hetter methods, exposure control is
conducted after an item is selected. The conditional probabilities
$P(A_i \mid S_i)$ are used as control parameters. These control parameters
guide the probability experiment in which it is determined whether the
selected item is administered or removed temporarily for the person tested
from the pool.

The idea underlying the method is that when $r_{\max}$ is the target value
for the maximum exposure rate, the conditional probabilities can be set in
such a way that $P(A_i) \le r_{\max}$. The procedure to find appropriate
values for the control parameters is quite time-consuming. In a series of
iterative adjustments, the appropriate values can be found.
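
The probability experiment can be sketched as follows. This is a minimal illustration that assumes the candidate items are already ranked by the selection criterion and that a rejected item is simply skipped for the current examinee; the function name, the dictionary of control parameters, and the values in it are hypothetical.

```python
import random

def sympson_hetter_administer(ranked_items, control, rng):
    """Walk down the ranked candidates and administer item i with
    probability control[i] = P(A_i | S_i); a rejected item is set aside
    for this examinee and the next candidate is tried."""
    for i in ranked_items:
        if rng.random() < control[i]:
            return i
    return None  # every candidate was rejected

rng = random.Random(1)
control = {0: 1.0, 1: 0.0, 2: 0.5}   # hypothetical control parameters
chosen = sympson_hetter_administer([1, 0, 2], control, rng)
```

Item 1 is the first choice of the selection algorithm here, but its control parameter of zero means it is never administered, so the experiment falls through to item 0.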

These Sympson-Hetter methods suffer from several drawbacks. When
the population is categorized based on ability, the exposure rates within
subgroups might still be high. Time-consuming simulation studies have to be
conducted for calculating the exposure control parameters. Moreover, the
procedure for calculating the control parameters does not converge
properly, and the claim that $P(A_i) \le r_{\max}$ holds cannot be
validated (van der Linden, 2003). Finally, it is also known that the
Sympson-Hetter method is hardly effective in dealing with underexposure
problems. Underexposure refers to the problem that items in the pool are
administered so seldom that the expense for constructing them cannot be
justified.

Several improvements of the original procedure have been developed.
Stocking & Lewis (1998) proposed to conduct exposure control conditional
on ability level, to overcome the problem of high exposure rates for specific
ability levels. They defined the events in (1) conditional on ability level.
The new relationship can be described as


$$P(A_i \mid \theta_j) = P(A_i \mid S_i, \theta_j)\, P(S_i \mid \theta_j), \quad j = 1, \ldots, J, \tag{2}$$

where J defines the number of ability levels to take into account. The time
needed to calculate the exposure control parameters increases J times,
because control parameters have to be calculated for all J ability levels.
When this new procedure is applied, exposure rates within subgroups of the
ability scale will also be below the specified level. This modification solves
one of the problems of the method, but convergence problems and loss of
total test information still exist.

Van der Linden (2003) proposed to modify the Sympson-Hetter
method to speed up the iterative adjustment process to find the exposure
control parameters. In the Sympson-Hetter method, the exposure parameters
are adjusted with the following rule:


$$P^{(t+1)}(A_i \mid S_i) := \begin{cases} 1 & \text{if } P^{(t)}(S_i) \le r_{\max}, \\ r_{\max} / P^{(t)}(S_i) & \text{if } P^{(t)}(S_i) > r_{\max}, \end{cases} \tag{3}$$


where t is the iteration number, and $r_{\max}$ is the desired target for
the exposure rates. The adjustment process can be sped up by
changing this rule into


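$$P^{(t+1)}(A_i \mid S_i) := \begin{cases} P^{(t)}(A_i \mid S_i) & \text{if } P^{(t)}(A_i) \le r_{\max}, \\ \text{[expression not legible in the source]} & \text{if } P^{(t)}(A_i) > r_{\max}, \end{cases} \tag{4}$$


where $\gamma$ is a parameter to increase the size of the adjustment.
Although less time is needed for finding exposure control parameters, the
process is still generally tedious and time-consuming, particularly if the
control parameters have to be set conditionally on a set of realistic
ability values for the population of examinees.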
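
One adjustment step of the original Sympson-Hetter rule (3) can be sketched as below. The selection probabilities would normally be estimated from a simulation round under the current control parameters; here they are hypothetical numbers, and the function name is illustrative.

```python
def adjust_control_parameters(p_selected, r_max):
    """One pass of rule (3): the control parameter becomes 1 when
    P(S_i) <= r_max and r_max / P(S_i) otherwise."""
    return [1.0 if p_s <= r_max else r_max / p_s for p_s in p_selected]

# Hypothetical selection probabilities estimated in one simulation round.
p_selected = [0.10, 0.30, 0.60, 0.90]
new_control = adjust_control_parameters(p_selected, r_max=0.30)
```

If the selection probabilities stayed fixed, the last two items would now be administered at exactly the target rate. In practice, blocking a popular item changes how often the other items are selected, which is why the step has to be repeated over many simulation rounds.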

Barrada, Veldkamp & Olea (2009) modified the Sympson-Hetter
approach by varying the exposure control parameters throughout the test
administration. To avoid that all items with high discriminating power are
selected when estimation of trait levels is still uncertain, low values for
$r_{\max}$ are imposed at the beginning of the test. The values of
$r_{\max}$ increase during CAT administration. So, highly discriminating
items are reserved for the later stages of the test.

Eligibility methods 

Recently, van der Linden and Veldkamp (2004, 2007) proposed to
formulate the exposure control problem as a problem of constrained test
assembly. Like the Sympson-Hetter method, a probabilistic algorithm is
used. However, this method does not need time-consuming simulation
studies to find control parameters for the probabilistic experiment. Based on
the observed exposure rates, the algorithm determines whether item
eligibility constraints are added to the model for selecting the items in CAT.
The method consists of several steps. First, a probability experiment is
conducted to determine if an item is eligible. Second, ineligibility
constraints are added to the test assembly model, and the model is solved.
Third, if the addition of eligibility constraints leads to an infeasible model,
the constraints are removed and the relaxed model is solved. The
probability for an item of being eligible to examinee (j+1) can be expressed
in terms of:

$\varepsilon_{ij}$: number of examinees through j for whom item i has been
eligible.

$\alpha_{ij}$: number of examinees through j to whom item i has been
administered.


For examinee (j+1), item i is eligible with estimated probability:


$$P^{(j+1)}(E_i) = \min\left\{ \frac{r_{\max}\, \varepsilon_{ij}}{\alpha_{ij}},\; 1 \right\}, \tag{5}$$


with $\alpha_{ij} > 0$. For $\alpha_{ij} = 0$, the probability of being
eligible is defined to be $P^{(j+1)}(E_i) = 1$. The method proved to perform
well in dealing with (over)exposure of popular items in the bank.
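
The eligibility experiment can be sketched as follows. The sketch assumes the eligibility probability has the form min{r_max * eligible count / administered count, 1}, with the value 1 when the item has never been administered; the function names and the counts are hypothetical.

```python
import random

def eligibility_probability(n_eligible, n_administered, r_max):
    """Estimated probability that an item is eligible for the next examinee.
    Assumed form: min(r_max * n_eligible / n_administered, 1), and 1 when
    the item has not been administered yet."""
    if n_administered == 0:
        return 1.0
    return min(r_max * n_eligible / n_administered, 1.0)

def draw_eligible_items(eligible_counts, administered_counts, r_max, rng):
    """Run the probability experiment once per item for the next examinee."""
    return [i for i in range(len(eligible_counts))
            if rng.random() < eligibility_probability(
                eligible_counts[i], administered_counts[i], r_max)]

p_popular = eligibility_probability(n_eligible=100, n_administered=60, r_max=0.30)
p_unused = eligibility_probability(n_eligible=100, n_administered=0, r_max=0.30)
```

An item that was administered to 60 of the 100 examinees for whom it was eligible is made eligible only half of the time for the next examinee, whereas an unused item always stays in the pool.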

Both the (modified) Sympson-Hetter methods and the Eligibility
methods mainly focus on overexposure of popular items in the pool.
Although decrease of exposure rates of the most popular items results in
some increase of exposure rates of less popular items, only exposure rates
of items with almost as favorable attributes as the most popular items
increase. Unpopular items are still hardly selected.

B.P. Veldkamp, et al.


Methods for controlling underexposure 

For solving the problem of underexposure, different methods have 
been developed. Chang & Ying (1999) introduced a-stratified testing. In 
their approach, item pools are stratified with respect to values of their 
discrimination parameters a. The first items are chosen from the stratum 
with the lowest a values. A second group of items is chosen from the
subsequent stratum, and the last items in the test from the stratum with
highest a values. This approach is based on the observation that estimates 
of the ability parameters are very unstable during the administration of the 
first few items of a CAT. Because of this, less discriminating items should 
be used in the earlier stages, while the most discriminating items should be 
used when estimates have been stabilized. The claim is that this approach 
would lead to a more balanced item exposure distribution and improve item 
pool utilization. Unfortunately, this method does not impose any bounds on 
exposure rates. Some observed exposure rates might be much higher than 
expected (Parshall, Kromrey, & Hogarty, 2000). Besides, the method is 
highly dependent on item bank properties. Usually, discrimination 
parameters are not uniformly distributed or the discrimination and the 
difficulty parameters might correlate positively. 

A different method for solving the problem of underexposure is
based on the observation that exposure problems result from the item
selection criterion that is applied. When items are selected that maximize
Fisher’s Information criterion, items with high discrimination values tend to
be selected more often than the others. One way to reduce both over- and
underexposure is to add a random component to the item selection criterion.
Revuelta and Ponsoda (1998) elaborated this idea in their Progressive
method. When this method is applied, a random value $R_i$ in the interval
[0, H], where H is the maximum value of the information function, is
assigned to each item in the bank. Items are selected based on a weighted
combination of the random component and Fisher’s information criterion:


$$\left(1 - \frac{s}{n}\right) R_i + \frac{s}{n}\, I_i(\theta), \tag{6}$$

where the weighting factor is determined by the serial position s of the item
in the test, and the total test length n. For selecting the first item, the value
of the criterion is dominated by the value of the random component, while
for selecting the last item, the random component does not influence the
criterion anymore. This method proved to be effective against
underexposure. However, it is not conditional on ability level, and it cannot
be guaranteed that targets for exposure rates will be met. Another drawback
is that items that are completely off target might be presented to a
candidate.
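
A minimal sketch of the Progressive criterion is given below. It assumes the weights (1 - s/n) for the random component and s/n for the information, which matches the description that the random part dominates at the start and vanishes for the last item; the function name and the information values are illustrative.

```python
import random

def progressive_criterion(info, s, n, h_max, rng):
    """Criterion value per item: (1 - s/n) * R_i + (s/n) * I_i, with R_i
    drawn uniformly from [0, h_max]. s is the serial position of the item
    to be selected and n is the test length."""
    w = s / n
    return [(1.0 - w) * rng.uniform(0.0, h_max) + w * x for x in info]

rng = random.Random(0)
info = [0.2, 1.0, 0.6]     # hypothetical information values at theta_hat
first = progressive_criterion(info, s=1, n=40, h_max=max(info), rng=rng)
last = progressive_criterion(info, s=40, n=40, h_max=max(info), rng=rng)
```

For the last item the criterion coincides with the information values, so the selection is purely information-driven again.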

Dealing with exposure control problems in CAT is rather 
complicated. Although several promising methods have been developed, all 
of them seem to suffer from various drawbacks. Because of this, exposure 
control problems still exist. In most large scale testing systems, a rather 
pragmatic approach is used and a combination of over- and underexposure 
control methods is implemented. For example, in most CATs developed by 
CITO, a combination of the Sympson-Hetter method and a generalization of 
the Progressive method is implemented (Eggen, 2001). By implementing a 
combination of methods, an attempt is made both to maximize measurement 
accuracy, and to balance item pool usage. 


MULTIPLE OBJECTIVITY AND EXPOSURE CONTROL 

When an exposure control method is implemented, the test assembly
problem can be formulated as an instance of multiple objective decision
making (Veldkamp, 1999). The first objective is to assemble tests
according to the test specifications. In general, the amount of information
in the test is maximized, while a number of constraints on test content, item
format, word count or gender orientation of the items have to be met. The
second objective in the process is related to exposure of the items. The
objective is to obtain an evenly distributed use of items in the bank. The
observation that the exposure control problem is a problem of multiple
objectives in test assembly is the cornerstone of the method presented in
this paper. The main idea is that exposure control methods should represent
this multiple-objectivity.

Both objectives can be formulated in mathematical programming
terms. The first objective can be formulated as:

$$\begin{aligned}
\max \; & \sum_{i=1}^{I} I_i(\theta)\, x_i \\
\text{subject to} \quad
& \sum_{i \in S_j} x_i \le n_j && \text{(categorical)} \\
& \sum_{i=1}^{I} a_{ij}\, x_i \le b_j && \text{(quantitative)} \\
& \sum_{i \in S_e} x_i \le 1 && \text{(inter-item dependencies)} \\
& \sum_{i=1}^{I} x_i = n && \text{(test length)} \\
& x_i \in \{0, 1\},
\end{aligned} \tag{7}$$


where $x_i$ denotes whether an item is selected ($x_i = 1$) or not
($x_i = 0$). The information in the test is maximized. The first general
constraint represents categorical constraints like content or item type. The
second constraint represents specifications related to quantitative
attributes like word count or response times. The third constraint is
formulated to deal with dependencies between items like enemies, but also
item sets. In this way, the first objective can be obtained.


To formulate the second objective is slightly more complicated. In
van der Linden and Veldkamp (2007) it is shown that the following equality
holds:


$$\sum_{i=1}^{I} \varphi_i = n, \tag{8}$$


where $\varphi_i$ is the observed exposure rate, and n represents the test
length. Because of this, it suffices to minimize the maximum exposure rate to
obtain an evenly distributed use of the items in the bank. Therefore, the
second objective can be formulated as


$$\min \max_i \; \frac{j\, \varphi_i + x_i}{j + 1}, \tag{9}$$


where j is the number of previously tested examinees. These two objectives
might conflict. To maximize the amount of information in the test, highly
discriminating items are often selected. On the other hand, to obtain an
evenly distributed use of the bank, these popular items cannot be
administered to all candidates. It comes down to the test assembler's
preferences how to deal with these conflicting objectives. One method for
dealing with multiple objective test assembly problems is to combine both
objectives in one single objective function, by using one of the
objectives as a weighting function for the other (Veldkamp, 1999). When
this method is applied to the exposure control problem, the information can
be weighted with some function of the observed item exposure rates. The
resulting objective of the test assembly problem can be formulated as:

$$\max \sum_{i=1}^{I} w(\varphi_i)\, I_i(\theta)\, x_i, \tag{10}$$


where $w(\varphi_i)$ is a weighting function that represents the test
assembler's preferences.

Computarized Adaptative Testing
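
For a CAT without further content constraints, where one item is selected at a time, maximizing the weighted objective amounts to picking the item with the largest product of weight and information. The sketch below illustrates this under that simplifying assumption; the step-shaped weighting function, the function names, and the numbers are hypothetical.

```python
def select_weighted(info, exposure, weight, administered=()):
    """Pick the available item that maximizes weight(exposure_i) * info_i."""
    candidates = [i for i in range(len(info)) if i not in administered]
    return max(candidates, key=lambda i: weight(exposure[i]) * info[i])

def step_weight(phi, r_max=0.30):
    """Hypothetical weighting function: 1 up to r_max and 0 above it."""
    return 1.0 if phi <= r_max else 0.0

info = [2.2, 1.0, 0.9]           # illustrative information values
exposure = [0.35, 0.10, 0.00]    # illustrative observed exposure rates
choice = select_weighted(info, exposure, step_weight)
```

The most informative item has an observed exposure rate above the target, so its weighted information drops to zero and the second item is chosen instead.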

Several weighting functions can be applied. For example, the function
can be based on the observation that the use of popular items can be
reduced by temporarily removing them from the pool of available items,
until their observed exposure rate is smaller than $r_{\max}$ (see Revuelta
& Ponsoda, 1998). This weighting function is shown in Figure 1a.

A second example is based on the observation that the use of
unpopular items ($\varphi_i \ll r_{\max}$) can be increased by increasing
their weights. To boost the use of unpopular items, the weighting function
might decrease for increasing exposure rates. This observation results in a
weighting function shown in Figure 1b.

The third example is related to test fairness. Because expelling some
items from administration for some students, as in the first and second
weighting function, might not be considered fair, assigning a small weight
for popular items ($\varphi_i > r_{\max}$) reduces the probability that they
are selected, but does not make them ineligible. Two weighting functions
that combine observations two and three are shown in Figures 1c and 1d.

Moreover, the causes of overexposure can be taken into account
when the weighting function is defined. The main cause of exposure
problems lies in the amount of information provided by the item. Since the
amount of information presented by an item is related to the squared
discrimination of an item, a weighting function that takes the amount of
information into account can be formulated as:


$$w_i(\varphi_i) = a_i^{-2}. \tag{11}$$

Figure 1. Weighting functions (weighting factor on y-axis and observed
exposure rate on x-axis).


In all these examples, a difference is made between items that are
overexposed ($\varphi_i > r_{\max}$) and those that are not
($\varphi_i \le r_{\max}$). For both intervals different weighting functions
can be defined, based on a number of observations. However, the question
remains which weighting function performs best for which interval.

A systematic approach to answer this question would be to distinguish
between both intervals and to see which function for which interval results
in the best exposure control method.


NUMERICAL EXAMPLES

A comparison study was carried out to judge the performance of the
multiple objective exposure control method. Several settings of the method
were compared with the Sympson-Hetter method, the alpha-stratified
method, randomized item selection, and CAT without exposure control. In
the first example, different weighting functions were compared. Different
methods for exposure control were compared in Example 2.


Example 1. 

To find the best settings for the multiple objective exposure control
method, several functions were implemented. The items in the bank were
calibrated with the OPLM, a special version of the 2PLM, where the
discrimination parameters are restricted to be integer. The OPLM is the
general IRT model underlying all CATs developed by CITO. The item bank
consisted of 300 items. The test length of all CATs was set equal to 40
items. Fisher’s Information criterion was used to select the items. The
ability was estimated with the Weighted maximum likelihood estimator
(Warm, 1989), assuming that the item parameters are known. The initial
estimate of the ability was set equal to zero. For all examples, 40000
examinees were randomly sampled from a normal distribution. The
maximum exposure rate was set equal to $r_{\max} = 0.30$ in the examples.
These settings most closely resembled the CITO context.

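To compare the results, the following criteria were applied. The
performance of the CAT was evaluated by taking both the bias and the root
mean squared error (RMSE) into account.

$$\text{bias} = \frac{\sum_{p=1}^{P} (\hat{\theta}_p - \theta_p)}{P}, \tag{12}$$

$$\text{RMSE} = \sqrt{\frac{\sum_{p=1}^{P} (\hat{\theta}_p - \theta_p)^2}{P}}, \tag{13}$$


where $p = 1, \ldots, P$ runs over all persons.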
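
Both criteria are straightforward to compute from the true and estimated abilities of the simulated examinees. The sketch below uses the usual definitions (mean error and root of the mean squared error); the function names and the four ability values are illustrative.

```python
import math

def bias(theta_hat, theta):
    """Mean difference between estimated and true abilities."""
    return sum(e - t for e, t in zip(theta_hat, theta)) / len(theta)

def rmse(theta_hat, theta):
    """Root of the mean squared difference between estimates and true values."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(theta_hat, theta)) / len(theta))

theta = [-1.0, 0.0, 1.0, 2.0]        # hypothetical true abilities
theta_hat = [-0.5, 0.0, 0.5, 2.0]    # hypothetical estimates
b = bias(theta_hat, theta)
r = rmse(theta_hat, theta)
```

The example shows why both criteria are reported: errors in opposite directions cancel in the bias, which is zero here, while the RMSE still reflects them.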

To control for underexposure of the items, three different functions
were distinguished for $\varphi_i \le r_{\max}$. The first function does not
control for underexposure of the items ($w_i(\varphi_i) = 1$). The second
function tries to control for underexposure by assigning decreasing weights
when the observed exposure rate increases. The function is defined such
that the weight equals one for items that have not been administered yet
($w_i(\varphi_i = 0) = 1$), and it linearly decreases, where the weight for
items with observed exposure equal to $r_{\max}$ is set equal to a constant
($w_i(\varphi_i = r_{\max}) = c$, where $c \ll 1$). The third function aims
at the causes of underexposure, and relates the weights to the inverse of
the squared discrimination.

For overexposure ($\varphi_i > r_{\max}$), four different functions were
distinguished in this study. First, overexposure was not allowed
($w_i(\varphi_i) = 0$). In the second function, a small weight is assigned
($w_i(\varphi_i) = c$). In the third function, the weight linearly decreases,
where the weight for items with observed exposure equal to $r_{\max}$ is set
equal to a constant ($w_i(\varphi_i = r_{\max}) = c$, where $c \ll 1$), and
the weight is set equal to zero when the observed exposure rate equals one
($w_i(\varphi_i = 1) = 0$). The fourth function aims at the causes of
overexposure, and relates the weights to the inverse of the squared
discrimination. In the examples, the weighting constant was set equal to
c = 0.4.
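
The twelve combinations of under- and overexposure functions can be generated from one small factory, sketched below. The option names are hypothetical labels, and the discrimination-based weight is assumed to be exactly 1 / a^2, which is only one way to relate a weight to the inverse of the squared discrimination.

```python
def make_weight(under, over, r_max=0.30, c=0.4):
    """Combine a rule for phi <= r_max (under) with a rule for phi > r_max
    (over). `a` is the discrimination parameter of the item."""
    if under not in ("none", "linear", "inverse_a2"):
        raise ValueError("unknown underexposure rule")
    if over not in ("zero", "constant", "linear", "inverse_a2"):
        raise ValueError("unknown overexposure rule")

    def weight(phi, a=1.0):
        if phi <= r_max:
            if under == "none":
                return 1.0
            if under == "linear":       # 1 at phi = 0, c at phi = r_max
                return 1.0 - (1.0 - c) * phi / r_max
            return 1.0 / (a * a)        # assumed form of the third function
        if over == "zero":
            return 0.0
        if over == "constant":
            return c
        if over == "linear":            # c at phi = r_max, 0 at phi = 1
            return c * (1.0 - phi) / (1.0 - r_max)
        return 1.0 / (a * a)            # assumed form of the fourth function

    return weight

w = make_weight("linear", "zero")
values = [w(0.0), w(0.15), w(0.30), w(0.31)]
```

With the linear underexposure rule and no overexposure allowed, the weight falls from 1 for an unused item to 0.4 at the target rate and drops to 0 as soon as the target is exceeded.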

When the multiple objective exposure control method is applied, any
weighting function is a combination of a function for controlling
underexposure and a function for controlling overexposure of the items. The
weighting functions were compared for two different settings,
$r_{\max} = 0.3$. Since 40 items were selected from an item bank of 300
items, the lower bound for $r_{\max}$ equals 40/300 = 0.133. Resulting bias
and RMSE for $r_{\max} = 0.3$ are shown in Table 1 and Table 2. The exposure
rates of the items are shown in Figure 2.
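
The lower bound follows from simple bookkeeping: every examinee receives exactly 40 items, so the observed exposure rates over the 300 items add up to 40, and the largest rate can never fall below the average of 40/300. The sketch below checks this with a purely random item selector, which is a stand-in and not the selection rule used in the study; the function name and the number of simulated examinees are illustrative.

```python
import random

def simulate_exposure_rates(n_items, test_length, n_examinees, rng):
    """Give every examinee a random test of fixed length and return the
    observed exposure rate of each item."""
    counts = [0] * n_items
    for _ in range(n_examinees):
        for i in rng.sample(range(n_items), test_length):
            counts[i] += 1
    return [c / n_examinees for c in counts]

rates = simulate_exposure_rates(n_items=300, test_length=40, n_examinees=200,
                                rng=random.Random(0))
total = sum(rates)        # equals the test length up to rounding error
lower_bound = 40 / 300    # the maximum rate cannot be smaller than this
```

Any selection rule redistributes the same total of 40 over the items, so a target below 0.133 could not be met by any method.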

With respect to functions controlling for overexposure, the results
were more or less what we had expected. The conditions where no
overexposure was allowed resulted in the highest values for the RMSE. The
lowest values were obtained when small weights were assigned to overexposed
items. Both adaptive functions ended up somewhere between them. An
unexpected effect was that controlling for underexposure resulted in smaller
RMSEs. This might be caused by an interaction between the composition of the
item pool and the adaptive item selection process.

Table 1. Bias for different combinations of weighting functions for
under- and overexposure.

                                                   Underexposure
Overexposure                   $w_i(\varphi_i) = 1$   $w_i(\varphi_i)$ = linear   $w_i(\varphi_i) = a_i^{-2}$
$w_i(\varphi_i) = 0$           0.000                  0.000                       0.000
$w_i(\varphi_i) = c$           0.000                  0.000                       0.001
$w_i(\varphi_i)$ = linear      0.000                  0.001                       0.000
$w_i(\varphi_i) = a_i^{-2}$    0.000                  0.000                       0.000

As can be seen in Table 1, the values for the resulting biases hardly 
differ from zero, and no significant differences between the conditions were 
found. 


Table 2. RMSEs for different combinations of weighting functions for
under- and overexposure.

                                                   Underexposure
Overexposure                   $w_i(\varphi_i) = 1$   $w_i(\varphi_i)$ = linear   $w_i(\varphi_i) = a_i^{-2}$
$w_i(\varphi_i) = 0$           0.098                  0.094                       0.096
$w_i(\varphi_i) = c$           0.094                  0.090                       0.090
$w_i(\varphi_i)$ = linear      0.095                  0.091                       0.092
$w_i(\varphi_i) = a_i^{-2}$    0.096                  0.093                       0.093

Figure 2. Observed exposure for different settings of the multiple
objective exposure control method ($r_{\max} = 0.30$).

The observed exposure rates are shown in Figure 2. This figure has to
be read in the same way as both tables; the first row of the first column
describes the results for the condition of no underexposure control
($w_i(\varphi_i) = 1$) and no overexposure allowed ($w_i(\varphi_i) = 0$),
etc.

For overexposure, the results were clear. The best results with respect
to observed exposure rates were obtained when no overexposure was
allowed (row 1). Allowing overexposed items to be used (rows 2-4) resulted
in high overexposure of some popular items. These results can be explained
by checking the weighting functions. Because the weighting functions just
weight the information provided by an item, very informative items might
still be selected when the difference in weights between overexposed and
less popular items is small. Among rows 2-4, the method of decreasing
weights (row 3) resulted in the smallest overexposure of the most popular
items.

For underexposure, the methods with decreasing weights (columns 2-3)
performed best. They performed better than the cases where no
underexposure control was applied (column 1). With respect to observed
exposure rates, no differences were found due to the way the weights
decreased.

Taking both RMSE and observed exposure rates into account, the best
results were obtained when no overexposure was allowed (row 1) and
underexposure was being controlled for with linearly decreasing weights
(column 2).

Example 2. 

To evaluate the performance of the multiple objective exposure 
control method, it was compared with the alpha-stratified method, the 
Sympson-Hetter method, and the progressive method in combination with 
Sympson-Hetter. For the alpha-stratified method we used four strata. 
Stratum 1 contained 40% of the items in the bank. Stratum 2 also contained 
40% of the items. Stratum 3 had 15% of the items. Stratum 4 had only 5% 
of the items. During the test assembly process, the same percentages of 
items were selected from the strata. To add some benchmarks, both 
randomized item selection and item selection based on Fisher Information 
without exposure control were added to the example. In this comparison 
study, the weighting function that performed best with respect to bias, 
RMSE and observed exposure rates in the first study was applied. The 
resulting function combined a linear part to control for underexposure and a 
weight equal to zero to control for overexposure. For every exposure
control method, 40000 CATs were simulated. The maximum exposure rates
were set equal to $r_{\max} = 0.30$ in these simulations. The results are
shown in Table 3.
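
As a worked illustration of the stratification, the sketch below converts the four stratum shares into item counts. It assumes that the bank of 300 items and the test length of 40 items from Example 1 also apply here, which the text does not state explicitly; the function name is illustrative.

```python
def items_per_stratum(total, shares):
    """Number of items that each stratum contributes when `total` items are
    split according to the stratum shares."""
    return [round(total * share) for share in shares]

shares = [0.40, 0.40, 0.15, 0.05]             # strata 1 to 4
per_test = items_per_stratum(40, shares)      # items drawn per test
per_bank = items_per_stratum(300, shares)     # items available per stratum
```

Under these assumptions, only 15 items sit in the most discriminating stratum and two of them are needed in every test, which gives an average exposure of 2/15 for that stratum when its items are used evenly.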


Table 3. Performance of different exposure control methods
($r_{\max} = 0.30$).

Method                        Bias     RMSE
No exposure control           0.000    0.086
Multiple objective method     0.000    0.094
Sympson-Hetter method         0.000    0.098
Alpha-stratified method       0.000    0.109
Progressive method (S-H)      0.000    0.097
Randomized item selection     0.001    0.133


When the results in Table 3 are compared, it can be observed that the
different exposure control methods did not result in any bias. Besides,
among the exposure control methods, the multiple objective exposure control
method resulted in the smallest RMSE.

The observed exposure rates are shown in Figure 3. It can be seen that
our implementation of the alpha-stratified method was not very successful
in dealing with overexposure. For some items the observed exposure rate
exceeded 0.40. A different stratification might have performed better,
although we did not succeed in finding good settings. With respect to
underexposure control, the alpha-stratified method performed best. For
practical applications, a combination of the alpha-stratified method with the
Sympson-Hetter method or the multiple objective method might be
recommended. Almost no differences were found between the Sympson-Hetter
method and the combination of the Progressive method and the
Sympson-Hetter method. The progressive method performed slightly better
with respect to underexposure. This implementation of the multiple
objective exposure control method resulted in most items with maximum
exposure rate. This also explains why this method resulted in the smallest
RMSE.

Figure 3. Observed exposure rates for multiple objectives (dotted),
Sympson-Hetter (dashed), Alpha-stratified (thin), and Progressive
(thick) exposure control.


DISCUSSION 

Exposure control is applied to computer adaptive testing programs for
several reasons. The most important reason is to prevent item compromise.
A second reason is to increase the usage of the item pool. Until now, several
exposure control methods have been developed that deal with the problem
of overexposure successfully. Underexposure of the items is still a
problem in many adaptive testing programs.

The multiple objective exposure control method was developed to
deal with both kinds of exposure control problems. One of the advantages
of the new method is that no time-consuming simulation studies have to be
carried out. The new method can be implemented ‘on the fly’. During the
administration, the additional time for selecting an item with the multiple
objective exposure control method was less than a millisecond. In the first
example, it can be observed how the weighting functions influence the
resulting tests. For example, the best results for the RMSE are obtained for
a weighting function that allowed overexposure of some popular items. In
other words, the tradeoff between RMSE and observed exposure rates can
be controlled by defining appropriate weighting functions.

The multiple objective exposure control method was described as a
deterministic method of exposure control. This implies that any
administration of the test directly influences the weights for the next
candidates. If such a dependency is undesirable, a probabilistic
implementation might be considered. The weighting functions
$w_i(\varphi_i)$ determine the probability for an item i to be selected.
Before any CAT is administered, a probability experiment is carried out for
every item to decide whether it is selected for the pool or not. For
examinee j+1, item i is eligible, that means available for selection, with
estimated probability


$$P^{(j+1)}(E_i) = w_i(\varphi_i), \tag{14}$$


where $E_i$ denotes the event that item i is eligible. In the experiment, a
random number u is drawn from the interval [0,1]. For $u \le P$, the item is
eligible; for $u > P$, the item is not eligible, where P is the probability
in (14). This probability experiment is comparable to the one described in
van der Linden & Veldkamp (2004). However, in this approach the test
specialist can define the function that relates the observed exposure rates
to the probability of being eligible. The result of this experiment is a
subset of the item pool that can be used for test administration.
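
The probabilistic variant can be sketched as below, assuming that the eligibility probability of an item is simply the value of the weighting function at its observed exposure rate. The particular weighting function (linear below the target, zero above it, with c = 0.4 and a target of 0.30) and all names are illustrative choices.

```python
import random

def linear_zero_weight(phi, r_max=0.30, c=0.4):
    """Illustrative weight: falls linearly from 1 (phi = 0) to c
    (phi = r_max) and is 0 for overexposed items (phi > r_max)."""
    if phi > r_max:
        return 0.0
    return 1.0 - (1.0 - c) * phi / r_max

def draw_eligible_pool(exposure, weight, rng):
    """Draw u from [0, 1) for each item and keep the item when
    u < weight(phi_i)."""
    return [i for i, phi in enumerate(exposure) if rng.random() < weight(phi)]

rng = random.Random(2024)
exposure = [0.00, 0.15, 0.45]    # hypothetical observed exposure rates
pool = draw_eligible_pool(exposure, linear_zero_weight, rng)
```

The unused item is always eligible, the overexposed item never is, and the item in between enters the pool for roughly 70 percent of the examinees.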

Finally, since the multiple objective exposure control method is an
interactive method where the parameters affecting the exposure control
method are updated during the test administration period, some remarks
have to be made about practical implementation. In a web-based
environment, with testing over the internet, updating the parameters
on-the-fly seems rather straightforward. However, when thousands of
examinees participate in a test at the same time, updating the parameters
every few minutes instead of continuously might be considered. This will
reduce the probability of crashing the web server. When the method is
applied in a classroom setting, which is most common for CITO CATs, the
exposure rates resulting from different locations can be combined
periodically.
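
Combining exposure information from several locations is easiest when each location reports administration counts rather than rates, since counts can simply be added. The sketch below assumes such a reporting format; the function name and the counts are hypothetical.

```python
def combine_exposure(location_counts, location_examinees):
    """Pool per-location administration counts into overall exposure rates."""
    total_examinees = sum(location_examinees)
    n_items = len(location_counts[0])
    totals = [sum(counts[i] for counts in location_counts)
              for i in range(n_items)]
    return [t / total_examinees for t in totals]

school_a = [30, 5, 0]     # hypothetical administration counts per item
school_b = [10, 15, 0]
rates = combine_exposure([school_a, school_b], [100, 100])
```

Averaging the per-location rates directly would give the same result only when all locations tested the same number of examinees, which is why the sketch pools the counts first.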

When the method is applied to operational CATs, one of the first
questions is which weighting function to implement. In the first
example, several weighting functions were compared for a given item bank.
This example just illustrates the effects of controlling for underexposure
and the effects of allowing overexposure of some of the items. The resulting
bias (Table 1), RMSE (Table 2), and observed exposure rates (Figure 2)
cannot be generalized beyond this example. However, based on theoretical
arguments, a practitioner could choose between controlling for
underexposure ($w_i(\varphi_i)$ = linear or $w_i(\varphi_i) = a_i^{-2}$) or
not controlling ($w_i(\varphi_i) = 1$). The same kind of decision needs to
be made about how strictly the maximum exposure rate $r_{\max}$ has to be
imposed. A small simulation study (comparable to the one in Example 1) can
be carried out to get a feeling for how the method might work for an
operational CAT with a given item bank. Although we in general recommend
performing simulation studies before starting any operational CAT, this
step is not a necessary requirement for the implementation of the multiple
objective exposure control method. The initial observed exposure rates can
be set equal to zero ($\varphi_i = 0$) for all items, and the values of
$\varphi_i$ can be updated after every test administration.
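
Starting from zero and updating after every administration can be sketched as a running average: after examinee j + 1, the rate of item i becomes (j * rate + x_i) / (j + 1), where x_i is 1 if the item was administered and 0 otherwise. The function name and the three-item bank are illustrative.

```python
def update_exposure(phi, j, administered):
    """Exposure rates after examinee j + 1, given the rates phi observed
    over the first j examinees and the set of items just administered."""
    return [(j * p + (1 if i in administered else 0)) / (j + 1)
            for i, p in enumerate(phi)]

phi = [0.0, 0.0, 0.0]                    # initial rates, no examinees yet
phi = update_exposure(phi, 0, {0, 1})    # first examinee saw items 0 and 1
phi = update_exposure(phi, 1, {0, 2})    # second examinee saw items 0 and 2
```

After two examinees the first item has been seen by both and the other two by one each, so no stored history beyond the current rates and the examinee count is needed.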

The multiple objectives exposure control method has not been
implemented in any commercial software package yet. It is generally
applicable to CAT programs based on, for example, the Weighted Deviation
Model (Stocking & Swanson, 1993) or the Shadow Test Approach (van der
Linden, 2005). For this study, the method was implemented in CAT
software developed at CITO in The Netherlands. For operational use,
practitioners either have to add a module to their CAT software that
calculates the weights for each item given the observed exposure rates, and
implement these weights in their item selection procedures, or they can
contact the authors.